MARK-AGE data management: Cleaning, exploration and visualization of data

Authors

  • Jennifer Baur
  • Maria Moreno-Villanueva
  • Tobias Kötter
  • Thilo Sindlinger
  • Alexander Bürkle
  • Michael R. Berthold
  • Michael Junk

Abstract

A database is an organized collection of data and is necessary for investigating a wide spectrum of research questions. When evaluating data, analysts should be aware of possible data quality problems that can compromise the validity of results. Data cleaning, which deals with the identification and correction of errors in order to improve data quality, is therefore an essential part of the data management process. In our cross-sectional study, biomarkers of ageing together with analytical, anthropometric and demographic data from about 3000 volunteers were collected in the MARK-AGE database. Although several preventive strategies were applied before data entry, errors such as miscoding, missing values and batch problems could not be avoided completely. Such errors can produce misleading information and affect the validity of the data analysis. Here we present an overview of the methods we applied for dealing with errors in the MARK-AGE database. In particular, we describe our strategies for the detection of missing values, outliers and batch effects and explain how these can be handled to improve data quality. Finally, we report on the tools used for data exploration and data sharing between MARK-AGE collaborators.
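
The three checks named in the abstract (missing values, outliers, batch effects) can be illustrated with a minimal Python sketch. This is not the MARK-AGE procedure, which the abstract does not specify. It assumes that missing entries are coded as None, swaps in Tukey's interquartile-range fences as a common outlier rule, and uses per-batch means as a rough batch-effect screen. All function names, the batch labels and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, quantiles


def count_missing(values):
    """Count entries coded as None (assumed missing-value coding)."""
    return sum(v is None for v in values)


def iqr_outliers(values, k=1.5):
    """Return observed values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in observed if v < low or v > high]


def batch_means(values, batches):
    """Mean of the observed values per batch label.

    A large gap between batch means is only a hint of a batch effect;
    it can also come from a single extreme value, so check outliers first.
    """
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        if value is not None:
            grouped[batch].append(value)
    return {batch: mean(vals) for batch, vals in grouped.items()}


# Hypothetical measurements of one biomarker, recorded in two batches.
measurements = [4.9, 5.1, None, 5.0, 5.2, 4.8, 12.0, 5.1, None, 5.0]
batch_labels = ["A"] * 5 + ["B"] * 5

print("missing:", count_missing(measurements))
print("outliers:", iqr_outliers(measurements))
print("batch means:", batch_means(measurements, batch_labels))
```

In this toy data the higher mean of batch B is driven by the single extreme value, which is why outlier screening is sketched before the batch comparison.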

Similar resources

An Interactive Visualization Environment for Data Exploration

Exploratory data analysis is a process of sifting through data in search of interesting information or patterns. Analysts' current tools for exploring data include database management systems, statistical analysis packages, data mining tools, visualization tools, and report generators. Since the exploration process seeks the unexpected in a data-driven manner, it is crucial that these tools are...

A new approach for data visualization problem

Data visualization is the process of transforming data, information, and knowledge into visual form, making use of humans’ natural visual capabilities which reveals relationships in data sets that are not evident from the raw data, by using mathematical techniques to reduce the number of dimensions in the data set while preserving the relevant inherent properties. In this paper, we formulated d...

Visualization of Computer Architecture Simulation Data for System-Level Design Space Exploration

System-level computer architecture simulations create large volumes of simulation data to explore alternative architectural solutions. Interpreting and drawing conclusions from this amount of simulation results can be extremely cumbersome. In other domains that also struggle with interpreting large volumes of data, such as scientific computing, data visualization is an invaluable tool. Such vis...

Experiences with using Data Cleaning Technology for Bing Services

Over the past few years, our Data Management, Exploration and Mining (DMX) group at Microsoft Research has worked closely with the Bing team to address challenging data cleaning and approximate matching problems. In this article we describe some of the key Big Data challenges in the context of these Bing services primarily focusing on two key services: Bing Maps and Bing Shopping. We describe i...

Visualization in Radiation Oncology: Towards Replacing the Laboratory Notebook

Data exploration in radiation oncology requires the creation of a large number of visualizations. For treatment planning, detailed information about the processes used to manipulate data collected and to create visualizations is needed for assessing the quality of the results. Current visualization systems allow the interactive creation and manipulation of complex visualizations. However, they ...

Journal title:
  • Mechanisms of Ageing and Development

Volume 151  Issue 

Pages  -

Publication date 2015